
Recent advancements in deep learning-based wearable human action recognition (wHAR) have improved the capture and classification of complex motions, but adoption remains limited due to the lack of expert annotations and domain discrepancies from user variations. Limited annotations hinder the model's ability to generalize to out-of-distribution samples. While data augmentation can improve generalizability, unsupervised augmentation techniques must be applied carefully to avoid introducing noise. Unsupervised domain adaptation (UDA) addresses domain discrepancies by aligning conditional distributions with pseudo-labeled target samples, but vanilla pseudo-labeling can lead to error propagation. To address these challenges, we propose μDAR, a novel joint optimization architecture comprising three functions: (i) a consistency regularizer between augmented samples to improve model classification generalizability, (ii) a temporal ensemble for robust pseudo-label generation, and (iii) conditional distribution alignment to improve domain generalizability. The temporal ensemble aggregates predictions from past epochs to smooth out noisy pseudo-label predictions. These pseudo-labels are then used in the conditional distribution alignment module, which minimizes the kernel-based class-wise conditional maximum mean discrepancy (kCMMD) between the source and target feature spaces to learn a domain-invariant embedding. The consistency-regularized augmentations ensure that multiple augmentations of the same sample share the same labels; this results in (a) strong generalization with limited source domain samples and (b) consistent pseudo-label generation in target samples. The novel integration of these three modules in μDAR yields an average macro-F1 score improvement of approximately 4-12% over six state-of-the-art UDA methods on four benchmark wHAR datasets.
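
The temporal ensemble and the class-wise alignment described in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the exponential moving average with startup bias correction, the confidence threshold, the RBF kernel, the biased MMD estimator, and all function names and parameters (`update_ensemble`, `pseudo_labels`, `classwise_mmd2`, `alpha`, `threshold`, `gamma`) are assumptions chosen for illustration.

```python
import math


def update_ensemble(ensemble, probs, epoch, alpha=0.6):
    """Blend this epoch's class probabilities into a running average.

    ensemble: accumulated per-sample probabilities (all zeros before epoch 1)
    probs:    this epoch's predicted probabilities, same shape
    epoch:    1-based epoch index, used for startup bias correction
    alpha:    assumed momentum; higher values weight past epochs more
    """
    new_ensemble = [
        [alpha * z + (1 - alpha) * p for z, p in zip(z_row, p_row)]
        for z_row, p_row in zip(ensemble, probs)
    ]
    scale = 1 - alpha ** epoch  # corrects the bias from the zero start
    corrected = [[z / scale for z in row] for row in new_ensemble]
    return new_ensemble, corrected


def pseudo_labels(corrected, threshold=0.6):
    """Return the argmax class per sample, or None if confidence is low."""
    labels = []
    for row in corrected:
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(best if row[best] >= threshold else None)
    return labels


def rbf(x, y, gamma=1.0):
    """RBF kernel between two feature vectors (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    def mean_k(a, b):
        return sum(rbf(u, v, gamma) for u in a for v in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)


def classwise_mmd2(src_feats, src_labels, tgt_feats, tgt_pseudo, gamma=1.0):
    """Average squared MMD over classes present in both domains.

    Target samples whose pseudo-label is None are ignored.
    """
    classes = set(src_labels) & {c for c in tgt_pseudo if c is not None}
    terms = []
    for c in classes:
        xs = [f for f, y in zip(src_feats, src_labels) if y == c]
        ys = [f for f, y in zip(tgt_feats, tgt_pseudo) if y == c]
        terms.append(mmd2(xs, ys, gamma))
    return sum(terms) / len(terms) if terms else 0.0
```

In a training loop under these assumptions, `update_ensemble` would be called once per epoch on the target-domain predictions, `pseudo_labels` would filter out low-confidence samples, and `classwise_mmd2` would serve as the alignment term added to the classification loss.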

AcouDL: Context-Aware Daily Activity Recognition from Natural Acoustic Signals

https://doi.org/10.1109/SMARTCOMP61445.2024.00077

Chakma, Avijoy; Das, Anirban; Md Faridee, Abu Zaher; Chakraborty, Suchetana; Chakraborty, Sandip; Roy, Nirmalya (June 2024, IEEE)

The ubiquity of smart and wearable devices with integrated acoustic sensors in modern human lives presents tremendous opportunities for recognizing human activities in our living spaces through ML-driven applications. However, their adoption is often hindered by the requirement of large amounts of labeled data during the model training phase. Integration of contextual metadata has the potential to alleviate this, since such metadata is often less dynamic (e.g., cleaning dishes and cooking can both happen in the kitchen context) and can often be annotated in a less tedious manner (e.g., a sensor that is always placed in the kitchen). However, most models do not have good provisions for the integration of such metadata. Often, the additional metadata is leveraged in the form of multi-task learning, with sub-optimal outcomes. On the other hand, reliably recognizing distinct in-home activities with similar acoustic patterns (e.g., chopping, hammering, knife sharpening) poses another set of challenges. To mitigate these challenges, we first show in our preliminary study that room acoustic properties such as reverberation, room materials, and background noise leave a discernible fingerprint in the audio samples that can be used to recognize the room context. We then propose AcouDL, a unified framework that exploits room context information to improve activity recognition performance. Our proposed self-supervision-based approach first learns the context features of the activities by leveraging a large amount of unlabeled data using a contrastive learning mechanism, and then incorporates these features into the activity classification pipeline through a novel attention mechanism to improve the activity recognition performance. Extensive evaluation of AcouDL on three datasets containing a wide range of activities shows that this efficient feature-fusion mechanism enables the incorporation of metadata, which helps to better recognize activities under challenging classification scenarios, with a 0.7-3.5% macro-F1 score improvement over the baselines.
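
The abstract does not say which contrastive objective is used to learn the context features. As a rough illustration only, the sketch below uses InfoNCE, a common contrastive loss, swapped in here as an assumption. The idea that two audio clips from the same room form a positive pair, the cosine similarity, the function names, and the `temperature` value are likewise hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor embedding.

    anchor, positive: embeddings assumed to share a room context
    negatives:        embeddings assumed to come from other rooms
    The loss is low when the anchor is closer to the positive
    than to every negative.
    """
    pos = cosine(anchor, positive) / temperature
    logits = [pos] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - pos
```

Minimizing this loss over many unlabeled clips would pull same-context embeddings together and push different-context embeddings apart, which is the general behavior a contrastive mechanism is expected to provide.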

